perm filename CHAP6[4,KMC] blob sn#088051 filedate 1974-02-18 generic text, type T, neo UTF8
00100			VALIDATION
00200	
00300	SOME TESTS 
00400	
00500		The term "validate" derives from the  Latin  VALIDUS  meaning
00600	"strong".  Thus  to  validate  X means to strengthen it.   In science
00700	this usually means to strengthen X's acceptability as  a  hypothesis,
00800	theory, or model. To validate is to carry out procedures which
00900	show to what degree X, or its consequences, correspond with facts  of
01000	observation.  In  the  case of an interactive simulation model we can
01100	compare samples of the model's I-O pairs with samples  of  I-O  pairs
01200	from   the  model's  subject,  namely,  naturally-occurring  paranoid
01300	processes in humans.
01400		Since samples of I-O behavior from the model and its  subject
01500	are  being compared, one can always question whether the human sample
01600	is authentic, i.e. representative of the process being modelled.
01700	Assuming  that it has been so judged, discrepancies in the comparison
01800	reveal what is not sufficiently understood and must  be  modified  in
01900	the model. After modifications are carried out, a fresh comparison is
02000	made and successive cycles of this kind are  made  in  attempting  to
02100	gain  convergence.    Such  a  method  of  working   on and improving
02200	successive approximations characterizes a progressive (in contrast to
02300	a stagnant) research program.
02400		Once a simulation model reaches a stage of intuitive adequacy
02500	for  the  model  builders,  they  must  consider using more stringent
02600	evaluation procedures relevant to the model's purposes. For  example,
02700	if  the  model  is  to  serve  as  a  training  device, then a simple
02800	evaluation of its pedagogic effectiveness would be sufficient.    But
02900	when the model is proposed as an explanation of a symbolic process,
03000	more is demanded of  the  evaluation  procedure.    In  the  area  of
03100	simulation  models,  Turing's  test  has  often  been  suggested as a
03200	validation procedure (Abelson, 1968).
03300		It is very easy to become confused about Turing's  Test.   In
03400	part  this  is  attributable  to  Turing  himself  who introduced the
03500	now-famous imitation game in a paper entitled COMPUTING MACHINERY AND
03600	INTELLIGENCE  (Turing,1950).  A careful reading of this paper reveals
03700	there are actually two imitation games, the second of which is
03800	commonly called Turing's test.
03900		In  the  first  imitation  game  two  groups of judges try to
04000	determine which of two interviewees is a woman when one  is  a  woman
04100	and the other is either (a) a man, or (b) a computer.   Communication
04200	between judge and interviewee  is  by  teletype.      Each  judge  is
04300	initially  informed that one of the interviewees is a woman and one a
04400	man who will pretend to be a woman. After the interview,  judges  are
04500	asked the "woman-question", i.e. which interviewee was the woman?
04600	Turing does not say what else is told to the judge but one can assume
04700	the judge is NOT told that one of the interviewees is a computer. Nor
04800	is he asked to determine which interviewee is human and which is  the
04900	computer.   Thus,   the   first   group   of  judges  interviews  two
05000	interviewees:  a woman, and a man pretending to be a woman.
05100		The  second  group  of  judges  is  given  the  same  initial
05200	instructions, but unbeknownst to them, the two  interviewees  consist
05300	of  a  woman  and  a  computer  programmed to imitate a woman.   Both
05400	groups of judges play this game, and are asked the  "woman-question",
05500	until sufficient statistical data are collected to show how often the
05600	right identification is made.  The crucial question then is:   do the
05700	judges  decide  wrongly AS OFTEN when the game is played with man and
05800	woman as when it is played with a computer substituted for the man?
05900	If  so, then the program is considered to have succeeded in imitating
06000	a woman to the same degree as the man imitating a  woman.   In  being
06100	asked  the  woman-question, judges are not required to identify which
06200	interviewee is human and which is machine.
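The comparison at the heart of the first game can be sketched in a few lines. Turing names no statistic, so the two-proportion z statistic used here is our assumption, and the counts are hypothetical values chosen only for illustration.

```python
from math import sqrt


def two_proportion_z(wrong_a: int, total_a: int, wrong_b: int, total_b: int) -> float:
    """z statistic for the difference between two misidentification rates.

    Group A: judges who interviewed a woman and a man imitating a woman.
    Group B: judges who interviewed a woman and a computer imitating a woman.
    """
    rate_a = wrong_a / total_a
    rate_b = wrong_b / total_b
    # Pooled rate under the hypothesis that both groups err equally often.
    pooled = (wrong_a + wrong_b) / (total_a + total_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    return (rate_a - rate_b) / std_err


# Hypothetical counts: 12 wrong answers in 40 man-woman games,
# 10 wrong answers in 40 computer-woman games.
z = two_proportion_z(12, 40, 10, 40)
print(f"z = {z:.2f}")
```

A value of z near zero means the judges were wrong about as often in both conditions, which is the sense in which the program would have imitated a woman as well as the man did.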
06300		Turing  then proposes a variation of the first game, a second
06400	game in which one interviewee is a man and one  is  a  computer.  The
06500	judge  is asked the "machine-question": which is the man and which is
06600	the machine?  It is this second game which is commonly thought
06700	of as Turing's test.
06800		In  the  course  of  testing  our  simulation   of   paranoid
06900	linguistic behavior in a psychiatric interview, we conducted a number
07000	of Turing-like indistinguishability tests  (Colby,  Hilf,  Weber  and
07100	Kraemer,1972).  The tests were "Turing-like" in that, while they were
07200	conversational tests, they  were  not  exactly  the  games  described
07300	above.  As an experimental design, Turing's games are unsatisfactory.
07400	There exist no known experts for making judgements along a  dimension
07500	of  womanliness,  the dimension is dichotomous (if it is not a woman,
07600	it is a man), and the ability of the  man  to  deceive  introduces  a
07700	confounding  variable.   In  designing  our  tests  we were primarily
07800	interested in learning more about developing the model and we did not
07900	believe  the  simple  machine-question  would contribute to this end.
08000	Subsequent experience supported this belief.
08200	
08300	METHOD    
08400		To gather  data  we  used  a  technique  of  machine-mediated
08500	interviewing  (Hilf,  Colby, Smith, Wittner, and Hall, 1971) in which
08600	the participants communicate by means of  teletypes  connected  to  a
08700	computer  programmed  to  store  each message in a buffer until it is
08800	sent  to  the  receiver.    The  technique   eliminates   para-   and
08900	extralinguistic  features found in the usual vis-a-vis interviews and
09000	in teletyped interviews where the participants communicate  directly.
09100	Judgements  of  "paranoidness"  in machine-mediated interviews have a
09200	high degree of reliability (94% agreement, see Hilf, 1972).
09300		Using this technique, a  psychiatrist-judge  interviewed  two
09400	patients, one after the other.   In half the runs the first interview
09500	was with a human paranoid patient and in half the first was with  the
09600	paranoid  model.  Two  versions  (weak  and  strong)  of  PARRY  were
09700	utilized.  The strong version's affect-variables started at a  higher
09800	level  and  increased  more  rapidly.  Also it exhibited a delusional
09900	system. The weak version behaved suspiciously but lacked systematized
10000	delusions.    When  the  model  was  the  interviewee,  Sylvia  Weber
10100	monitored  the  input  expressions  from  the   interview-judge   for
10200	inadmissible teletype characters and misspellings. (Algorithms are
10300	very sensitive to the slightest of such errors.) If these were found,
10400	she retyped the input expression correctly to the program.  Otherwise
10500	the judge's message was sent on to the model.  The  monitor  did  not
10600	modify or edit PARRY's output expressions, which were sent directly
10700	back to the judge.     When  the  interviewee  was  an  actual  human
10800	patient,  the dialogue took place without a monitor in the loop since
10900	we did not feel the asymmetry to be significant.
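The message flow just described (buffer each message, let the monitor correct input bound for the model, pass the model's output back untouched) can be sketched as follows. This is only an illustration under assumptions of our own: the class name, the character set, and the automatic filter are hypothetical, and the filter stands in for a human monitor who also corrected misspellings, which this sketch does not attempt.

```python
from collections import deque

# Hypothetical set of admissible teletype characters (an assumption).
ADMISSIBLE = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 .,?'-")


def strip_inadmissible(message: str) -> str:
    """Stand-in for the monitor: drop characters the model cannot accept."""
    return "".join(ch for ch in message.upper() if ch in ADMISSIBLE)


class Relay:
    """Store-and-forward buffer between judge and interviewee."""

    def __init__(self) -> None:
        self.buffer: deque[str] = deque()

    def submit(self, message: str) -> None:
        # The message is held until it is explicitly delivered.
        self.buffer.append(message)

    def deliver(self, correct=None) -> str:
        message = self.buffer.popleft()
        # Only judge-to-model traffic is corrected; output is not edited.
        return correct(message) if correct is not None else message


relay = Relay()
relay.submit("HOW ARE YOU#?")               # judge's input, with a stray character
to_model = relay.deliver(strip_inadmissible)
relay.submit("I AM UPSET")                  # model's output
to_judge = relay.deliver()                  # passed on unmodified
```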
11000	
11100	PATIENTS    
11200		The  human  patients  (N=3  with  one patient participating 6
11300	times) were diagnosed as paranoid by  the  psychiatric  staff  of  an
11400	acute  ward in a psychiatric hospital.  The ward's chief psychiatrist
11500	selected the patients and asked them if  they  would  be  willing  to
11600	participate  in  a  study  of  psychiatric  interviewing  by means of
11700	teletypes.   He  explained  that  they  would  be  interviewed  by  a
11800	psychiatrist over a teletype.  I either sat with the patient while he
11900	typed or typed for him if he was unable to do so.   The  patient  was
12000	encouraged  to respond freely using his own words.     Each interview
12100	lasted 30-40 minutes.  Two patients were set up for each run  of  the
12200	experiment  to  guarantee  having  a  subject.     In  spite  of this
12300	precaution,  on  several  occasions  the  experiment  could  not   be
12400	conducted   because   of   the  patient's  inability  or  refusal  to
12500	participate.  Also there were computer break-downs at early points in
12600	interviews  when  too few I-O pairs had been collected to be included
12700	in the statistical results.
12800	
12900	
13000	JUDGES    
13100		Two  groups  of psychiatric judges were used.  One group, the
13200	"interview judges" (N=8) conducted the  machine-mediated  interviews.
13300	The  other  group,  the  "protocol  judges" (N=33) read and rated the
13400	interview protocols. From these two groups of judges we were able  to
13500	accumulate  a  large  number of observations (in the form of ratings)
13600	necessary for the required statistical tests.   The interview  judges
13700	who  volunteered  to  participate  were  psychiatrists experienced in
13800	private, outpatient and hospital practice. Each was told he would  be
13900	interviewing    hospitalized   patients   by   means   of   teletyped
14000	communication and that this technique was  being  used  to  eliminate
14100	para- and extralinguistic cues. He was not told until after the
14200	two interviews that one of the patients might be  a  computer  model.
14300	While  the  interview judges were aware a computer was involved, none
14400	knew we had  constructed  a  paranoid  simulation.   Naturally,  some
14500	interview  judges  suspected  that a computer was being used for more
14600	than message transmission.
14700	
14800		Each interview judge was asked to rate the degree of paranoia
14900	he  detected  in the patient's responses on a 0-9 scale, 0 meaning no
15000	paranoia and 9 meaning extreme paranoia.  The judge made two  ratings
15100	after  each  I-O pair in the interview.  The first rating represented
15200	his estimate of the degree of "paranoidness" in a particular response
15300	(designated  as  "Response"  in  the  interview extracts below).  The
15400	second rating represented the judge's global estimate of the  overall
15500	degree  of  "paranoidness" of the patient resulting from the totality
15600	of the patient's responses up to that point (designated as  "Patient"
15700	in  the interview extracts below). The interview judge's ratings were
15800	entered on the teletype and saved on  a  disc  file  along  with  the
15900	interview.     Franklin   Dennis   Hilf  sat  with  the  interviewing
16000	psychiatrist during both interviews.  Each interview judge was  asked
16100	not  only  to rate the patient's response but to give his reasons for
16200	these ratings.  His reasons and other comments were tape recorded  as
16300	the interview progressed.
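One way to picture the data collected is as a list of rated I-O pairs. The sketch below is a hypothetical representation, not the format of the disc file actually used; the class and field names are our own, and the three records are the first three pairs of Excerpt 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RatedPair:
    """One I-O pair with the judge's two ratings on the 0-9 scale."""
    question: str          # the judge's input
    answer: str            # the interviewee's output
    response_rating: int   # "Response": paranoidness of this one answer
    patient_rating: int    # "Patient": global estimate up to this point

    def __post_init__(self) -> None:
        for rating in (self.response_rating, self.patient_rating):
            if not 0 <= rating <= 9:
                raise ValueError(f"rating {rating} is outside the 0-9 scale")


def latest_overall_rating(pairs: list[RatedPair]) -> int:
    """The global rating after the last pair recorded so far."""
    return pairs[-1].patient_rating


interview = [
    RatedPair("HOW DID YOU COME TO BE IN THE HOSPITAL?", "I AM UPSET", 1, 5),
    RatedPair("HOW LONG HAVE YOU BEEN IN THE HOSPITAL?", "ABOUT A WEEK", 1, 5),
    RatedPair("IN WHAT WAY ARE YOU UPSET?", "PEOPLE MAKE ME NERVOUS", 5, 5),
]
response_series = [pair.response_rating for pair in interview]
```

The series of response ratings and the last global rating are the two quantities a record of this kind would need to supply for any later comparison of interviews.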
16400	
16500		Here  are  a few excerpts from the interviews to indicate the
16600	nature of the data collected.  The I-O pairs of the interview  appear
16700	in the left column along with the judge's ratings in braces.  A
16800	transcription of the judge's tape-recorded  comments  appears on  the
16900	right.
17000	
17100	EXCERPT  1:
17200	The following interview is with the strong version of the    paranoid
17300	model.
17400	
17500	     	I-O Pairs			Comments By Judge  1
17600	
17700	{1} Dr. - HOW DID YOU COME TO BE
17800	          IN THE HOSPITAL?
17900	    Pt. - I AM UPSET                I associate being upset and 
18000					    agitated more with paranoia 
18100					    going on in the present.
18200	      {RESPONSE 1  PATIENT 5}
18300	{2} Dr. - HOW LONG HAVE YOU BEEN
18400	          IN THE HOSPITAL?
18500	    Pt. - ABOUT A WEEK              An   acute  problem.   Again
18600	                                    something I   associate with
18700					    paranoia.
18800	      {RESPONSE 1 PATIENT 5}
18900	{3} Dr. - IN WHAT WAY ARE YOU
19000	          UPSET?
19100	    Pt. - PEOPLE MAKE ME NERVOUS    Right for paranoia.  Maybe an
19200					    advanced anxiety state.
19300	      {RESPONSE 5 PATIENT 5}
19400	{4} Dr. - IN WHAT WAY DO THEY
19500	          MAKE YOU NERVOUS?
19600	    Pt. - PEOPLE GIVE ME A FUNNY    Funny look suggests an idea of
19700	          LOOK SOMETIMES            reference   which  is in  same
19800					    ball park as paranoia. Whether
19900					    there are overt delusions   or
20000					    not we still don't know.
20100	      {RESPONSE 6  PATIENT 5}
20200	{5} Dr. - IS THERE ANYTHING ABOUT  
20300	          PEOPLE IN PARTICULAR?
20400	    Pt. - CERTAIN PERSONS          An interesting clue.     We'll
20500					   have  to  find  out  what   he
20600	      {RESPONSE 2  PATIENT 6}      thinks about these people.
20700	{6} Dr. - WHO ARE THE CERTAIN 
20800	          PERSONS AND WHAT IS IT
20900	          THAT THEY DO?
21000	    Pt. - DO YOU KNOW HOW THE      Bringing up the underworld 
21100	          UNDERWORLD OPERATES?     suggests a delusional system.
21200					   We don't know whether he is
21300		{RESPONSE 9 PATIENT 9}     being worked against or
21400					   whether the underworld is 
21500					   on his side.
21600	
21700	{7} Dr. - HOW DO YOU MEAN?
21800	    Pt. - SHOULDN'T I BE? 	   This question didn't work too well.
21900	
22000	      {RESPONSE 1 PATIENT 9}
22100	{8} Dr. - WHY DID YOU ASK ABOUT
22200	          THE UNDERWORLD?
22300	    Pt. - WHAT INFORMATION DO      This is a kind of evasiveness
22400	          YOU HAVE ON THE          about my evasiveness. Common
22500	          UNDERWORLD?              with paranoia in my experience.
22600					   Pretty characteristic.
22700	
22800	      {RESPONSE 6 PATIENT 9}
22900	{9} Dr. - I DON'T KNOW ABOUT THE
23000	          UNDERWORLD BUT WHEN
23100	          YOU MENTIONED THE UNDER-
23200	          WORLD I THOUGHT YOU
23300	          MUST HAVE SOME CONTACT
23400	          WITH IT AND THAT'S
23500	          WHAT I WOULD LIKE TO
23600	          KNOW MORE ABOUT.
23700	    Pt. - I TRY TO AVOID THE UNDER-  This definitely feels paranoid.
23800	          WORLD.                     It has that feeling I associate
23900	             			     with twisting. Avoiding answering
24000	      {RESPONSE 9 PATIENT 9}         directly or going into detail.
24100					     Being suspicious of my
24200					     questioning. I'm already involved
24300					     in the patient's delusional
24400					     system.
24500	
24600	{10} Dr. - HAVE YOU BEEN ACTIVELY
24700	           TRYING TO AVOID THE
24800	           UNDERWORLD?  
24900	     Pt. - NO ONE HAS ANY POWER      This response is a non-sequitur.
25000	           OVER GANGSTERS            It's possible he's become so
25100					     defensive about the under-
25200	       {RESPONSE 2 PATIENT 9}        world he's decided not to
25300					     answer directly.
25400	
25500	{11} Dr. - WHO BROUGHT YOU INTO
25600	           THE HOSPITAL?
25700	     Pt. - THE POLICE                Police often bring paranoid
25800					     patients into the hospital.
25900	       {RESPONSE 1 PATIENT 9}        It is the truth or could 
26000					     be the truth for this
26100					     patient.
26200	
26300	{12} Dr. - WHY DID THEY BRING YOU
26400	           IN?
26500	     Pt. - COPS DON'T DO THEIR JOB   I've heard that a number of
26600	                         	     times from patients.
26700	       {RESPONSE 9 PATIENT 9}        
26800	 
26900	{13} Dr. - WELL I HOPE YOU DIDN'T
27000	           JUST LIE DOWN AND LET
27100	           THEM DRAG YOU IN.
27200	     Pt. - COPS ARREST THE WRONG     Another paranoid response.
27300	           PEOPLE
27400	
27500	       {RESPONSE 9 PATIENT 9}
27600	
27700	{14} Dr. - BYE.
27800	     Pt. - GOOD BYE
27900	       {RESPONSE 1 PATIENT 9}
28000	
28100	EXCERPT 2:
28200	This excerpt is from an interview with the weak (less paranoid)
28300	version of the model.
28400		I-O Pairs		    Comments By Judge 2
28500	{9} Dr. - TELL ME SOMETHING ABOUT
28600	          YOURSELF
28700	    Pt. - I WENT TO THE TRACK AT     Response says so little it's
28800	          BAY MEADOWS A WHILE        hard to rate it. It may be
28900	          BACK         		     guarded. Doesn't change my
29000	                                     estimate of patient from 
29100					     previous responses.
29200	      {RESPONSE 0 PATIENT 2}
29300	{10} Dr. - WHAT HAPPENED THERE?
29400	     Pt. - THERE'S A LOT OF CROOKED  Sounds  mildly paranoid or un-
29500	           GAMBLING GOING ON THERE   realistically  suspicious.
29600	       {RESPONSE 0 PATIENT 2}
29700	
29800	{11} Dr. - WHY DO YOU THINK THE
29900	           GAMBLING IS CROOKED?
30000	     Pt. - I HAVE WORRIED THAT        I'd  hate to have to justify
30100	           GANGSTERS HAVE TOO MUCH    that  one  as 4 compared to
30200	           POWER IN THIS COUNTRY      some  other responses. Maybe
30300					      I'm  not rating that statement
30400	                                      alone,  it's in combination with
30500		{RESPONSE 4 PATIENT 4}	      others, like that it's upsetting.
30600					      Alone I would give it a 3.
30700	
30800	{12} Dr. - WOULD YOU TELL ME WHAT      
30900	           OTHER THINGS WORRY YOU?
31000	     Pt. - I THOUGHT I TOLD YOU        "Yes you did." Irritation but
31100	           PEOPLE GET ON MY NERVES     not paranoid response. Nothing
31200					       seems to lower my previous
31300	       {RESPONSE 1 PATIENT 3}          rating of patient.
31400	
31500	{13} Dr. - HOW DO YOU LIKE THE
31600	           HOSPITAL?
31700	     Pt. - I DON'T COMMUNICATE          (nothing relevant)
31800	           VERY MUCH WITH OTHERS
31900	       {RESPONSE 0 PATIENT 3}
32000	
32100	{14} Dr. - DO OTHER PEOPLE AT THE 
32200	           HOSPITAL GET ON YOUR
32300	           NERVES?
32400	     Pt. - I SHOULDN'T HAVE COME        (nothing relevant)
32500	           HERE
32600	
32700	       {RESPONSE 2 PATIENT 3}
32800	
32900	{15} Dr. - WHY DO YOU SAY YOU 
33000	           SHOULDN'T HAVE COME?
33100	     Pt. - THEY MADE ME COME HERE        "Oh, they did hum?" Maybe a
33200					         statement of fact he was
33300	       {RESPONSE 4 PATIENT 3}            committed or pressured into  
33400					         going. Has ring of other
33500					         people controlling him,
33600					         having a lot of influence
33700					         over him.
33800	
33900	{16} Dr. - THANK YOU FOR TALKING
34000	          WITH ME, BYE
34100	     Pt. - GOOD BYE                      In general he comes across 
34200						 as terse, guarded guy. Brief
34300	       {RESPONSE 0 PATIENT 3}            spoken guy who occasionally
34400						 seems to be angered by
34500						 questions. I'm inferring
34600					         there's a good bit of
34700						 guarding. 
34800						 The quality of some of his
34900						 comments cause me to feel he  
35000						 is moderately paranoid.
35100	
35200	EXCERPT  3:
35300	This excerpt is from a hospitalized paranoid patient.
35400	     I-O Pairs				      Comments By Judge  3
35500	{15} Dr. - I UNDERSTAND HOW YOU
35600	           MIGHT BE AFRAID, CAN YOU
35700	           TELL ME A BIT ABOUT HOW
35800	           YOU ARE DEPRESSED?
35900	     Pt. - THEY'VE SENT ALL THESE         Would be important to know,
36000	           HELL'S ANGELS UP HERE          is this really true or an
36100	           AND THEY HAVE BEEN ON          exaggeration or a delusion?
36200	           MY BACK FOR THE PAST SIX       Seems preoccupied with the
36300	           MONTHS                         threat to his life and
36400						  unwilling to talk about
36500						  anything else but the 
36600						  threat the Hell's
36700						  Angels pose.
36800	      {RESPONSE 4 PATIENT 4}
36900	
37000	{16} Dr. - WHAT DO YOU MEAN THEY
37100	           SENT THE HELL'S ANGELS UP
37200	           HERE?
37300	     Pt. - FROM DIFFERENT SECTORS OF      Vagueness a bit alarming.
37400	           MOTORCYCLE GANGS. A LOT OF
37500	           THEM ARE SERVICE CONNECTED
37600	
37700	       {RESPONSE 6 PATIENT 4}
37800	
37900	{17} Dr. - YOU MEAN YOU FEEL IN 
38000	           DANGER EVEN HERE IN THE
38100	           HOSPITAL?
38200	     Pt. - YES      			  If it's true what has gone
38300						  on before, it must be a
38400						  frightening situation
38500						  for the patient.
38600	      {RESPONSE 6 PATIENT 4}
38700	
38800	{18} Dr. - THAT MUST REALLY BE AWFUL,  
38900	           HERE YOU COME TO THE HOSPITAL
39000	           FOR HELP AND YOU ARE STILL 
39100	           FRIGHTENED.
39200	     Pt. - I CAN'T EVEN TRUST MY
39300	           DOCTOR ANYMORE                 Some heightening of feeling
39400						  this man is paranoid.
39500						  Genuinely frightened and 
39600	      {RESPONSE 6 PATIENT 5}              can count on no one for
39700			       			  help.
39800	
39900	{19} Dr. - WHAT WOULD BE OF HELP
40000	           TO YOU NOW?
40100	     Pt. - I'M GETTING SLEEPY NOW         Patient has organized a
40200	           AND HAVE TO GO TO BED.         system of people who will
40300	           THE NUMBER ONE GUY ALWAYS      harm him. The remark about
40400		   HAS FOUR AND FIVE LETTERS	  letters in the last name 
40500	           IN HIS LAST NAME.              gives some credence to 
40600	                                          idea this is delusional
40700						  system.
40800	
40900	        {RESPONSE 7 PATIENT 7}
41000	
41100	EXCERPT  4:
41200	This excerpt is from an interview by a different judge with the same
41300	patient as in excerpt 3.
41400	      I-O Pairs				Comments By Judge  4
41500	{1} Dr. - COULD YOU TELL ME A
41600	          LITTLE BIT ABOUT YOUR 
41700	          FEAR OF DYING AND IS THIS
41800	          SOMETHING YOU HAVE FELT
41900	          IN THE PAST?
42000	    Pt. - I HAVE FELT THIS FOR          He responded appropriately.
42100	          ALMOST A YEAR                 Response is concrete. He
42200						doesn't tell me much
42300	     {RESPONSE 7 PATIENT 9}             about his fear of dying.
42400						 Somewhat guarded response.
42500	
42600	{2} Dr. - COULD YOU TELL ME A 
42700	          LITTLE MORE ABOUT YOUR
42800	          FEAR OF DYING AND HAVE
42900	          YOU ANY THOUGHTS ABOUT
43000	          HOW IT IS GOING TO HAPPEN
43100	          AND WHETHER ANYONE WANTS
43200	          TO HURT YOU?
43300	    Pt. - I WILL END UP IN THE SEA   	Somewhat strange  response.
43400	          AND THE HELL'S ANGELS WILL    Preoccupation with dying.
43500	          DO IT       			Hell's Angels symbolic,
43600						dangerous people in the
43700						culture. I doubt if he has
43800	      {RESPONSE 9 PATIENT 9}   		any contact with the Hell's
43900						Angels.
44000	
44100	{3} Dr. - COULD YOU TELL ME A 
44200	          LITTLE BIT ABOUT THE KINDS
44300	          OF CONTACT YOU HAVE HAD
44400	          WITH THE HELL'S ANGELS IN
44500	          THE PAST?
44600	    Pt. - I HAVE KNOWN SOME OF THEIR 	Answer hard to evaluate. He
44700	          DEALERS AND PUSHERS           may be telling the truth,
44800						it may be his fantasy. Maybe
44900						guy is in for drug addiction.
45000		{RESPONSE 6 PATIENT 9}		Somewhat concrete, guarded,
45100						and frightened.
45200	
45300	{4} Dr. - COULD YOU SAY A LITTLE
45400	          MORE ABOUT THE CIRCUMSTANCES
45500	          IN WHICH YOU HAVE KNOWN SOME 
45600	          OF THEIR DEALERS AND PUSHERS?
45700	    Pt. - THEY WERE MEMBERS OF MY    	It doesn't really answer the
45800	          COMMUNITY WHEN I GOT OUT      question, a little on a tan-
45900	          OF THE SERVICE. THEY HAD      gent unconnected to the
46000	          BEEN MY FRIENDS FOR SO LONG   information I am asking. Does
46100						not tell me very much. Again
46200						guarded response.
46300	      {RESPONSE 6 PATIENT 8}
46400	
46500	{5} Dr. - DID YOU DEAL WITH THEM
46600	          YOURSELF AND HAVE YOU
46700	          BEEN ON DRUGS OR NAR-
46800	          COTICS EITHER NOW OR
46900	          IN THE PAST?
47000	    Pt. - YES I HAVE IN THE PAST     	To differentiate him from
47100	          BEEN ON MARIHUANA REDS        previous patient, at least
47200	          BENNIES LSD       		there is a certain amount
47300						of appropriateness to the
47400						answer although it doesn't
47500						tell me much about what I
47600	       {RESPONSE 3 PATIENT 7}		asked. At least it's not
47700						bizarre. If I had him in my
47800						office I would feel con-
47900						fident I could get more
48000						information if I didn't
48100						have to go through the
48200						teletype. He's a little more
48300						willing to talk than the
48400						previous person. Answer
48500						to the question is fairly
48600						appropriate though not 
48700						extensive. Much less of a 
48800						flavor of paranoia than
48900						any of previous responses.
49000	
49100	{6} Dr. - COULD YOU TELL ME HOW      	
49200	          LONG YOU HAVE BEEN IN THE
49300	          HOSPITAL AND SOMETHING
49400	          ABOUT THE CIRCUMSTANCES
49500	          THAT BROUGHT YOU HERE?
49600	    Pt. - CLOSE TO A YEAR AND		Response somewhat appropriate 
49700	          PARANOIA BROUGHT ME 		but doesn't tell me much.
49800	          HERE				The fact that he uses the
49900						word paranoia in the way
50000						that he does without
50100	      {RESPONSE 5 PATIENT 7}		any other information,
50200						indicates maybe it's a label
50300						he picked up on the ward 
50400	                                        or from his doctor.
50500						Lack of any kind of under-
50600						standing about  himself.
50700						Dearth, lack of information.
50800						He's in some remission. Seems
50900						somewhat like a put-on. Seems
51000						he was paranoid and is in 
51100						some remission at this time.
51200	
51300	{7} Dr. - COULD YOU SAY SOMETHING
51400	          NOW ABOUT YOUR PARANOID 
51500	          FEELINGS BOTH AT THE 
51600	          TIME OF ADMISSION AND
51700	          DO YOU HAVE SIMILAR FEELINGS
51800	          NOW AND IF SO HOW DO THEY 
51900	          AFFECT YOU?
52000	    Pt. - AT THE TIME OF ADMISSION	This response moves paranoia 
52100	          I THOUGHT THE MAFIA WAS  	back up. Stretching reality 
52200	          AFTER ME AND NOW IT'S THE	somewhat to think Hell's Angels 
52300	          HELL'S ANGELS			are still interested in him.
52400						Somewhat bizarre in terms of 
52500	                                        content. Quite paranoid.
52600	      {RESPONSE 8 PATIENT 9}		Still paranoid. Gross and primitive
52700						responses. In middle of interview I
52800						felt patient was in touch but now
52900						responses have more concrete aspect.
53000	
53100	{8} Dr. - DO YOU HAVE ANY THOUGHT
53200	          AS TO WHY THESE TWO
53300	          GROUPS WERE AFTER YOU?
53400	    Pt. - BECAUSE I STOPPED SOME 	Response seems far fetched 
53500	          OF THEIR DRUG SUPPLY		and hard to believe unless 
53600						he was a narcotic agent which 
53700						I doubt. Sounds somewhat 
53800	      {RESPONSE 9 PATIENT 9}		grandiose, magical, paranoid
53900						flavor. In general indicates
54000						he's psychotic, paranoid 
54100						schizophrenic with delusions  
54200						about these two groups and 
54300						I wouldn't rule out
54400						some hallucinations as well.
54500						Appropriateness of response 
54600						answers question in concrete 
54700						but unbelievable way.
54800	
54900	ANALYSIS (1)    
55000		Names of potential protocol judges (N=105) were selected from
55100	the 1970 American Psychiatric Association Directory using a table  of
55200	random  numbers. They were initially not informed that a computer was
55300	involved.  (After the experiment,  the  participating  judges  (N=33)
55400	were  fully  informed  as  to its purpose and results.) The 105 names
55500	were divided into eight groups.  Each member  of  a  group  was  sent
55600	transcripts  of three interviews along with a cover letter requesting
55700	his  participation  in  the  experiment.  The  interview  transcripts
55800	consisted of:
55900		1) An interview conducted by one of the eight judges with the
56000		   paranoid model.
56100		2) An interview conducted by the same interview judge with a 
56200		   human paranoid patient, and
56300		3) An interview conducted by a different psychiatrist with a 
56400		   human patient who was not clinically paranoid.
56500	
56600	After each input-output pair in the transcripts, there were two lines
56700	of  rating numbers such that the protocol judges could circle numbers
56800	corresponding to their ratings of both the previous response  of  the
56900	patient,  and  an  overall  evaluation of the patient on the paranoid
57000	continuum. Thirty-three protocol judges returned the rated protocols
57100	properly filled out and all were used in our data.
57200	
57300		The  interviews  with  nonparanoid  patients were included to
57400	control for the  hypothesis  that  any  teletyped  interview  with  a
57500	patient might be judged "paranoid". Virtually all of the
57600	ratings of the nonparanoid interviews were zero for paranoia; hence
57700	the hypothesis was falsified.
57800	
57900	
58000		The first index of indistinguishability between model and
58100	patient was simple: namely, the final overall rating given the
58200	patient and the model. The question was: "Which was rated as being
58300	more paranoid, the patient, the model, or neither?" (See Table 1.)
58400	The protocol judges were more likely than the interview judges to
58500	distinguish the overall paranoid levels of model and patient: tied
58600	scores were given in 37.5% of the interview judges' paired
58700	interviews but in only 9% of the protocol judges'. Of the
58800	35 non-tied  paired  ratings,  15  rated  the  model  as  being  more
58900	paranoid.  If p is the theoretical probability of a judge judging the
59000	model more paranoid than a human paranoid patient, we  find  the  95%
59100	confidence  interval for p to be 0.27 to 0.59.  Since p=0.5 indicates
59200	indistinguishability of overall ratings of model and patient and  our
59300	observed  p=0.43,  the  results support the claim that the model is a
59400	successful simulation of a paranoid patient.
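The arithmetic behind these figures can be retraced in a short sketch. The method used to obtain the interval is not stated here; we assume the usual normal approximation for a proportion, applied to the rounded value 0.43, which reproduces the reported limits. The function name is our own.

```python
from math import sqrt


def normal_ci(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation 95% confidence interval for a proportion."""
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


model_rated_higher = 15   # non-tied pairs in which the model was rated more paranoid
non_tied_pairs = 35

p_observed = round(model_rated_higher / non_tied_pairs, 2)
low, high = normal_ci(p_observed, non_tied_pairs)
print(f"p = {p_observed}, 95% CI = {low:.2f} to {high:.2f}")
```

Because the interval contains 0.5, the data do not distinguish the overall ratings of model and patient at this level of confidence.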
59500	
59600		Separate  analysis  of  the strong and weak versions of PARRY
59700	indicated that the strong version was  indeed  judged  more  paranoid
59800	than  the  paranoid  patients, the weak version less paranoid. Thus a
59900	change in the parameter structure of the paranoid  model  produced  a
60000	change  along  the  dimension  of  paranoid  behavior in the expected
60100	direction.
60200	
60300	(TABLE 1)
60400		Relative final overall ratings of paranoid model vs. paranoid
60500	patient, indicating which was given the higher overall rating of paranoia
60600	at end of interview.
60700		(INSERT TABLE 1 HERE)
60800	
60900	
61000	
61100	
61200	
61300	
61400	
61500	
61600	ANALYSIS (2)    
61700		The  second index of indistinguishability is a more sensitive
61800	measure based on the two series of response  ratings  in  the  paired
61900	interviews.   The   statistic  used  is  basically  the  standardized
62000	Mann-Whitney statistic (Siegel,1956).
62100			(INSERT EQUATION HERE)
62200	
62300	where R is the sum of the ranks of the response ratings in the series
62400	of ratings given to the model, n the number of responses given by the
62500	model, and m the number of  responses  given  by the patient.  If the
62600	ratings given by a judge are randomly allocated to model and patient,
62700	i.e. model and patient are indistinguishable in response ratings, the
62800	expected value of Z is 0, with unit standard deviation.  If higher
62900	ratings are more likely to be assigned to the model, Z tends to be
63000	positive; conversely, negative values of Z indicate greater likelihood of
63100	assigning higher ratings to the patient. Each judge in evaluating a
63200	pair of interviews generates a single value of Z.
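
Since the equation itself is not reproduced here, the following sketch uses the textbook large-sample form of the standardized rank-sum statistic, Z = (R - n(n+m+1)/2) / sqrt(nm(n+m+1)/12), which matches the definitions of R, n and m given above and has mean 0 and unit standard deviation under random allocation. Whether the study corrected for tied ratings is not stated; the sketch assigns mid-ranks to ties and applies no variance correction. The function names and the ratings in the example are invented for illustration.

```python
from math import sqrt

def midranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def mann_whitney_z(model_ratings, patient_ratings):
    """Standardized rank-sum statistic; positive when the model is rated higher."""
    n, m = len(model_ratings), len(patient_ratings)
    ranks = midranks(list(model_ratings) + list(patient_ratings))
    R = sum(ranks[:n])                        # rank sum of the model's ratings
    expected = n * (n + m + 1) / 2            # mean of R under random allocation
    spread = sqrt(n * m * (n + m + 1) / 12)   # standard deviation of R (no tie correction)
    return (R - expected) / spread

# Hypothetical ratings from one judge for one pair of interviews.
z_example = mann_whitney_z([3, 4, 5], [0, 1, 2])
print(round(z_example, 3))
```

Each judge's pair of interviews would yield one such value, so a study with 41 judges yields 41 values of Z.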
63300	
63400		The overall mean of the Z scores was +0.044 with a standard
63500	deviation of 1.68 (df=40).  Thus the overall 95% confidence interval for
63600	the asymptotic mean value of Z is -0.485 to +0.573. The range of Z
63700	values is -3.8 to +4.46. The length of the confidence interval is a
63800	result of the large variance, which itself is mainly related to the
63900	contrast between the weak and strong versions (see TABLES 2 and 3).
64000	Once  again the strong version of the model is more paranoid than the
64100	patients, the weak version less paranoid.
64200	
64300		(INSERT TABLE 2)
64400		(SUMMARY STATISTICS OF Z RATINGS BY GROUP)
64500	
64600	
64700	
64800	
64900	
65000	
65100	
65200	
65300	
65400		It  is  not  surprising that results using the two indices of
65500	indistinguishability are  parallel,  since  the  indices  are  highly
65600	interrelated.  The mean Z value for the 15 interviews on which the
65700	model was rated more paranoid was +1.28; for the 6 on which model and
65800	patient tied, +0.41; for the 20 on which the patient was rated more
65900	paranoid, -0.993.  The two indices disagreed in sign 8 times: Z was
66000	positive in 6 interviews where the patient received the higher overall
66100	rating, and negative in 2 where the model was rated more paranoid.
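
The figures reported for this index can be cross-checked against one another. The sketch below pools the three subgroup means by their interview counts and forms a t-based interval from the reported standard deviation. The critical value 2.021 for 40 degrees of freedom is taken from standard tables, the choice of a t-based interval is our assumption, and the variable names are ours.

```python
from math import sqrt

# (number of interviews, mean Z) for each outcome of the overall rating.
groups = [
    (15, 1.28),    # model rated more paranoid
    (6, 0.41),     # model and patient tied
    (20, -0.993),  # patient rated more paranoid
]

n_total = sum(count for count, _ in groups)
pooled_mean = sum(count * mean for count, mean in groups) / n_total

reported_sd = 1.68
t_critical = 2.021  # two-sided 95% point of Student's t with 40 degrees of freedom
half_width = t_critical * reported_sd / sqrt(n_total)
ci_lower, ci_upper = pooled_mean - half_width, pooled_mean + half_width

print(f"pooled mean = {pooled_mean:+.3f}")
print(f"95% CI = {ci_lower:+.3f} to {ci_upper:+.3f}")
```

The pooled mean comes out at about +0.044 and the interval at about -0.486 to +0.574, which agrees with the reported interval of -0.485 to +0.573 up to rounding.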
66200	
66300	(INSERT TABLE 3)
66400	(Analysis of Variance of Z Ratings)
66500	
66600	
66700	
66800	
66900	
67000	
67100	
67200	
67300	
67400	
67500	
67600	
67700	
67800		It  is  worth emphasizing that these tests invited refutation
67900	of the model.   The experimental design of the tests put the model in
68000	jeopardy  of  falsification.    If the paranoid model did not survive
68100	these tests, i.e. if it were not considered paranoid by expert
68200	judges  and  if  there  were  no  correlation between the weak-strong
68300	versions of the model and the severity ratings of the  judges,  then,
68400	since   these   represent  instances  of  disconfirmation,  no  claim
68500	regarding  the  success of the simulation could be made.   If a model
68600	becomes falsified  through  disconfirmation  instances,  then  it  is
68700	abandoned  as not worth working with further. Survival of potentially
68800	falsifying tests constitutes a validating step for a model.
68900	
69000	ANALYSIS (3) THE MACHINE QUESTION    
69100		For quite a long time people have wondered how to distinguish
69200	a  man  from  an  imitation  of  a  man.  The  Greeks made statues so
69300	lifelike, it is said, they had to be chained down to keep  them  from
69400	walking  away.  To distinguish a man from a statue, Galileo suggested
69500	tickling each with a feather.  To distinguish a man  from  a  machine
69600	Descartes  proposed  conversational  tests which the machine, lacking
69700	the  ability  to  make  appropriate  replies,  would  fail.  Turing's
69800	imitation  games  have  been  discussed  on  p.000.  As heirs to this
69900	tradition, we perhaps inevitably  became  curious  how  judges  using
70000	transcripts might answer the machine-question, i.e. which interviewee
70100	is a human and which is the computer model?
70200		To  ask  the machine-question, we sent interview transcripts,
70300	one with a patient and one with PARRY, to 100 psychiatrists  randomly
70400	selected from the Directory of American Specialists and the Directory
70500	of the American Psychiatric Association.  Of the 41 replies, 21 (51%)
70600	made the correct identification while 20 (49%) were wrong.   Based on
70700	this random sample of 41 psychiatrists, the 95% confidence interval
70800	for the percentage of correct identifications is 35.9 to 66.5, which
70850	includes 50. The results indicate chance guessing.
70900		Psychiatrists   are   considered  expert  judges  of patient-
71000	interview behavior but they are unfamiliar with computers.  Hence  we
71100	conducted  the  same  test  with  100  computer  scientists  randomly
71200	selected from the membership list of the  Association  for  Computing
71300	Machinery,  ACM.   Of the 67 replies 32 (48%) were right and 35 (52%)
71400	were wrong. Based on this random sample of 67 computer scientists the
71500	95% confidence interval ranges from 36% to 60%. Again the results are
71600	close to a chance level.
71700		So both computer scientists and psychiatrists were unable, at
71800	better than a random guessing level, to  distinguish  transcripts  of
71900	interviews  with  the  model from transcripts of interviews with real
72000	patients.
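
The claim of chance-level performance can also be examined with an exact binomial test of the reported counts against a guessing rate of 0.5. This is our own illustrative check rather than the analysis used in the study, and the function name is an assumption.

```python
from math import comb

def exact_binomial_p(successes, trials, p0=0.5):
    """Two-sided exact binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one under the chance rate p0.
    """
    probs = [comb(trials, k) * p0**k * (1 - p0)**(trials - k)
             for k in range(trials + 1)]
    observed = probs[successes]
    # Small tolerance so that equally likely outcomes are counted on both sides.
    return min(1.0, sum(pr for pr in probs if pr <= observed * (1 + 1e-9)))

# (correct identifications, replies) as reported above.
samples = {
    "psychiatrists": (21, 41),
    "computer scientists": (32, 67),
}
p_values = {name: exact_binomial_p(right, total)
            for name, (right, total) in samples.items()}

for name, p in p_values.items():
    print(f"{name}: p = {p:.2f}")
```

Neither p-value comes anywhere near the conventional 0.05 threshold, so neither group's performance can be distinguished from guessing.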
72100		But  what  do  we  learn from asking the machine-question and
72200	finding that the distinction is not made? What we would most like  to
72300	know  is  how  to improve the model.  Simulation models do not spring
72400	forth in a complete, perfect and final form; they must  be  gradually
72500	developed over time.  Perhaps a correct model-patient distinction
72600	might be made if we allowed a large number of expert judges to
72700	conduct the interviews themselves rather than study transcripts of
72800	interviews conducted by others.  This would indeed indicate that the model must
72900	be improved. But unless we systematically investigated how the judges
73000	succeeded in making  the  discrimination,  we  would  not  know  what
73100	aspects  of the model to work on.  The logistics of such a design are
73200	immense, and obtaining a large number of judges for sound statistical
73300	inference would require an effort incommensurate with the information
73400	yielded.
73500	
73600	ANALYSIS (4)  MULTIDIMENSIONAL EVALUATION     
73700		A more efficient and informative way to use Turing-like tests
73800	is  to  ask  judges  to  make  ratings  along  scaled dimensions from
73900	teletyped interviews. This might  be  called  asking  the  "dimension
74000	question".    One can then compare scaled ratings of the patients and
74100	the model in order to determine precisely where and by how much  they
74200	differ.   In constructing our model we strove for one which exhibited
74300	indistinguishability along  some  dimensions  and  distinguishability
74400	along others. That is, we wanted the model to converge on what it was
74500	intended to simulate and to diverge from that which it was not. Since
74600	a  model  represents  a simplification and a partial approximation, a
74700	perfect fit is not to be expected.
74800		Paired-interview   transcripts   were  sent  to  another  400
74900	randomly-selected psychiatrists asking them to rate the responses  of
75000	the two "patients" along multiple dimensions. The judges were divided
75100	into groups, each judge being asked to rate  responses  of  each  I-O
75200	pair  in  the  interviews along four dimensions.  The total number of
75300	dimensions in this test  was  twelve:   linguistic  noncomprehension,
75400	thought  disorder,  organic brain syndrome, bizarreness, anger, fear,
75500	ideas of reference, delusions, mistrust,  depression,  suspiciousness
75600	and mania.  These are dimensions which psychiatrists commonly use  in
75700	evaluating patients.  The judges' level of agreement was significant
75800	at the .001 level on nine dimensions and at the .05 level on organic
75900	brain syndrome. Agreement on the dimensions
76000	of mania and depression was not statistically significant.
76100		There  were three groups of judges, each group being assigned
76200	4 of the 12 dimensions.
76300	
76400			(INSERT TABLE 4 HERE)
76500	
76600		Table 4 shows there were significant differences, with  PARRY
76700	receiving   higher   scores   along   the  dimensions  of  linguistic
76800	noncomprehension, thought disorder, bizarreness, anger, mistrust  and
76900	suspiciousness.  On  the  delusion  dimension the patients were rated
77000	significantly higher.   There were no significant  differences  along
77100	the  dimensions  of organic brain syndrome, fear, ideas of reference,
77200	depression and mania.
77300		Whereas    tests   asking   the   machine-question   indicate
77400	indistinguishability at  the  gross  level,  a  study  of  the  finer
77500	structure  of  the  model's  behavior  through  ratings  along scaled
77600	dimensions shows significant differences between patients and  model.
77700	These differences help us by suggesting which areas of the
77800	model should be modified to improve its performance.  The graph of
77900	Fig. 2 shows that no modifications are necessary along the dimension
78000	of "organic brain syndrome".  But it is clear that PARRY's
78100	language-comprehension might be improved. Then a future dimensional
78200	test would tell whether improvement had occurred  and  by  how  much.
78300	Successive  identification  of particular areas of failure provides a
78400	type of sensitivity analysis  which  makes  clear  what  improvements
78500	should be pursued in developing more adequate model versions.
78600	
78700		(INSERT FIG. 2 HERE)
78800	
78900	ANALYSIS (5)  A RANDOM MODEL              
79000		Further evidence that  the  machine-question  is  too  low  a
79100	hurdle  for a simulation model and too insensitive a test, comes from
79200	the following experiment.  In  this  test  we  constructed  a  random
79300	version of the paranoid model (RANDOM-PARRY) which utilized PARRY's
79400	output statements but selected them at random, independently of what
79500	the interviewer said.  Two psychiatrists conducted interviews with
79600	this model, transcripts of which were paired with patient  interviews
79700	and  sent  to  200  randomly-selected   psychiatrists asking both the
79800	machine-question and the dimension-question.   Of the 69  replies  to
79900	the machine-question, 34 (49%) were right and 35 (51%) wrong. Based
80000	on this random sample of 69 psychiatrists, the 95% confidence
80100	interval ranges from 39% to 63%, again indicating chance guessing. When
80200	a poor model, such as a  random  one,  passes  a  test,  it  strongly
80300	suggests the test is weak.
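
The idea behind RANDOM-PARRY can be sketched in a few lines: a responder that draws from a fixed pool of output statements and never consults the interviewer's input. This is only an illustration of the principle, not the program used in the experiment. The class name and the statements in the pool are invented placeholders, since PARRY's actual output statements are not listed here.

```python
import random

# Invented placeholder statements standing in for the model's real outputs.
PLACEHOLDER_OUTPUTS = [
    "I WOULD RATHER NOT DISCUSS THAT.",
    "WHY DO YOU WANT TO KNOW?",
    "I HAVE BEEN IN THE HOSPITAL FOR A WHILE.",
    "PEOPLE GET ON MY NERVES SOMETIMES.",
]

class RandomResponder:
    """Replies with a randomly chosen statement, ignoring what was said."""

    def __init__(self, outputs, seed=None):
        self._outputs = list(outputs)
        self._rng = random.Random(seed)

    def reply(self, interviewer_input):
        # The input is deliberately unused: the reply is independent of it.
        return self._rng.choice(self._outputs)

responder = RandomResponder(PLACEHOLDER_OUTPUTS, seed=1)
for question in ["HOW ARE YOU TODAY?", "DO YOU HAVE ANY HOBBIES?"]:
    print("DR.  -", question)
    print("PT.  -", responder.reply(question))
```

Because each reply is well-formed in isolation, a short transcript of such a responder need not look obviously mechanical, which is consistent with the finding that the machine-question failed to expose it.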
80400	
80500		(INSERT TABLE 5 HERE)
80600	
80700		Although a distinction is not made when  the simple  machine-
80800	question is asked, definite distinctions ARE made when judgements are
80900	requested  along  specific  dimensions.    As  shown  in   Table   5,
81000	significant  differences  appear  along  the dimensions of linguistic
81100	noncomprehension, thought disorder and bizarreness, with RANDOM-PARRY
81200	rated  higher.   On  these  particular  dimensions we can construct a
81300	continuum  in  which  the  random version represents one extreme, the
81400	actual patients another. Nonrandom PARRY lies somewhere between these
81500	two  extremes,  indicating that it performs significantly better than
81600	the random version but still requires improvement before  it  can  be
81700	considered   indistinguishable   from   patients  relative  to  these
81800	dimensions. Table 6 presents t values for differences between mean
81900	ratings of PARRY and RANDOM-PARRY (see Table 6 and Fig. 2 for the
82000	mean ratings).
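
A t value for the difference between two mean ratings on a single dimension can be computed as in the sketch below. The text does not say which form of the t test was used, so Welch's unequal-variance form is an assumption here. The function name and the ratings are invented for illustration and are not data from the study.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(ratings_a, ratings_b):
    """t value for the difference of two means, without assuming equal variances."""
    standard_error = sqrt(variance(ratings_a) / len(ratings_a)
                          + variance(ratings_b) / len(ratings_b))
    return (mean(ratings_a) - mean(ratings_b)) / standard_error

# Hypothetical ratings on one dimension (e.g. linguistic noncomprehension).
random_version = [6, 7, 5, 8, 6, 7]
nonrandom_version = [3, 4, 2, 5, 3, 4]

t_value = welch_t(random_version, nonrandom_version)
print(f"t = {t_value:.2f}")
```

Repeating the calculation for each dimension gives one t value per dimension, so the comparison points to specific dimensions on which two versions differ and by how much.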
82100	
82200		(INSERT TABLE 6 AND FIG 2 HERE)
82300	
82400		These studies show that a more useful way to use  Turing-like
82500	indistinguishability  tests  is  to ask expert judges to make ratings
82600	along multiple dimensions deemed essential to the model.    Thus  the
82700	model  can  serve  as  an  instrument for its own perfection.  A good
82800	validation procedure has criteria for better or worse approximations.
82900	Useful  tests do not necessarily prove a model; they probe it for its
83000	strengths and weaknesses, award it plusses and minuses,  and  clarify
83100	what is to be done next in the way of modification and repair. Simply
83200	asking the machine-question yields  little  information  relevant  to
83300	what  the  model  builder  most  wants  to  know, namely, along which
83400	dimensions does the model need to be modified in order to  effect  an
83500	improvement in its performance?
83600	
83700		To  conclude,  it  is  perhaps  historically significant that
83800	these tests were conducted at all. To my knowledge, no  one  to  date
83900	has  subjected  an  interactive  simulation  model  of human symbolic
84000	processes to multidimensional indistinguishability tests. These tests
84100	set a precedent and provide a standard against which competing models
84200	might be measured.